Part 65: Disability Corner: Dictation Software, a Practical Demonstration
This disability corner, unlike the previous ones, is going to be as much a demonstration of how this sort of thing works in real life as an exploration of theory. Think of it as a window into my life instead of an excuse for me to talk about my workflow, which it also is.
So, anyway, dictation software. Five-ish years ago, I managed to damage the tendons in my arms through repeated misuse and overstress. It isn't really a standard case of tendinitis: my grip is just as strong as it's always been and my movement isn't inhibited, but any extended use causes me increasingly severe pain. I can, say, type for short periods without issue, but five or so minutes of typing leaves my arms feeling wrong; if I double that, I end up in significant pain, and if I double that again I have to stop because the pain's gotten so bad I can't focus on whatever I'm doing. I used to be a far more active gamer, but now anything that requires rapid or frequent inputs is off-limits: no shooters, no action games, no clickers, no intensive strategy games that have you making a bunch of decisions in a row. And since my skill set consists almost entirely of my ability to write and communicate? That hasn't made it easy to find or hold down a job. At all. I lucked into a diversity program that swung me a chance to try out for a job, and I lucked out again when the program manager spotted my communication skills and took me on as an advisor even though I didn't meet the original job's requirements. I kind of despaired when I first learned about the tendinitis, not just because it felt like my future was slipping away, but also because I treasured recreational writing. I've spent my whole life writing for fun, it shapes how I approach the world, and I thought I'd lost it. It was... A bad time.
The only upside? I can now sometimes predict oncoming storms; turns out that old story about old people feeling it in their bones is true because my tendons ache more or less depending on air pressure. Hooray!
But there's a whole world of assistive technology out there, one I tapped into. I'm writing this sentence (as I have the whole LP) using a program called Dragon NaturallySpeaking. As I speak into my laptop's microphone (or a set of headphones), the program converts the speech into either text or commands, depending on the situation and what program I'm trying to work with. It's a complex piece of software, one with quirks and flaws and one that isn't cheap (90 bucks for the home version). But I could not LP without it, and, honestly, it's kind of been a blessing; I find it easier to write this way than by typing, in spite of where it falls short. So let me show you how it holds together by walking you through the update process.
First, I play the game.
Despite the fact that KS is specifically about disabled people, it lacks the backend support my dictation software needs to interact with it. Dictation software can't just navigate anything; just like screenreaders, it interfaces with code instead of visuals, so I can't navigate the program directly. That doesn't mean I can't interact with it at all, though.
Dragon switches between entering text and taking commands depending on context, and some of those commands work independently of whatever program you're using. Saying "mouse grid" divides my screen into nine numbered parts, and saying one of those numbers moves the mouse into that portion and subdivides it into nine again. I can do this as many times as I like until I reach a point where I know clicking at the center of the selection will get me something I want (like pressing a button), at which point I just say "click." So if I wanted to open the Russian version of Wikipedia in the example above, I'd say "mouse grid one two seven click." I probably wouldn't say it all at once, though; the software's intentionally biased towards natural language, and precise commands like this tend to go awry pretty easily, so I'd say the first two words together (to activate the command) and the next few with pauses to let the software work (which also lets me verify that I'm going where I want to). I also can't just say "click" at the start, since that'll just write the word, and I definitely can't say "mouse grid click" because the command doesn't work like that; the software will interpolate whichever number it thinks most closely matches the tiny pause I left between "grid" and "click" and click there. That shouldn't be too much of an issue with KS, though.
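For anyone who wants to see the narrowing-down in concrete terms, here is a minimal Python sketch of how a spoken sequence of grid numbers could resolve to a click point. It is an illustration only, not Dragon's actual code: the function name mouse_grid_target is made up, and it assumes the nine cells are numbered left to right, top to bottom, starting from the top-left corner.

```python
def mouse_grid_target(width, height, cells):
    """Return the (x, y) centre of the region picked by a list of grid numbers.

    Assumption (not taken from Dragon's documentation): cells are numbered
    1-9, left to right and top to bottom, like the keys on a phone keypad.
    """
    left, top = 0.0, 0.0
    for cell in cells:
        if not 1 <= cell <= 9:
            raise ValueError(f"grid numbers run from 1 to 9, got {cell}")
        # Each spoken number shrinks the active region to one ninth of itself.
        width /= 3
        height /= 3
        row, col = divmod(cell - 1, 3)
        left += col * width
        top += row * height
    # "click" lands on the centre of whatever region is left.
    return (left + width / 2, top + height / 2)


if __name__ == "__main__":
    # "mouse grid one two seven click" on a hypothetical 2700 x 2700 screen
    print(mouse_grid_target(2700, 2700, [1, 2, 7]))
```

Every number divides the region by three along each axis, so after three numbers the target cell is only 1/27 of the screen wide and tall.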
So I open KS, say "mouse grid," and it minimizes KS to show the grid.
This is something you have to get used to as you use any kind of accessibility software: it's not always obvious how a given command works, so every time you try to use a new program or function you'll usually run into issues. Yeah, I know that goes for a lot of programs, but good luck finding a tutorial for any given program; there's too much out there for the devs to anticipate what any given program will do, and the vast majority of programmers either don't know or don't care about accessibility enough to do anything about it. Fortunately, there's a way around this.
There are two ways to make KS go to the next line of dialogue: clicking or pressing space. We now know the first one is off-limits, but just saying "space" has the software interpret it as a command and advances the dialogue. Same goes for most of the game's shortcuts, and since shortcuts save you ridiculous amounts of time when you enter commands, you'll probably quickly grow proficient in them anyway. The problem? There aren't any shortcuts that directly save your game, and without the ability to click on a choice, no matter what you do you'll never get past that first choice when you get into the school. And so, we're reduced to using the mouse manually, which kind of defeats the purpose of using dictation software in the first place. It's tragic, it's infuriating, but as I mentioned earlier, I still have enough use in my hands to do this manually as long as I'm careful and don't go for several hours in a row.
Fun fact: speaking of things that get in the way of LPing, for sighted users (they have other options for non-sighted users) Dragon marks whatever you last said in a little yellow box. It's useful when writing, but it displays over whatever program you're using, which means any screenshots I take with the software on don't look so great. I had to redo Emi's Act 2 movie because I forgot to turn it off and every shot had it in the middle of the screen.
Doesn't look great.
So anyway, I have to play the game manually. Knowing that, I boot up KS, Word, my browser, the directory where the screenshots go, and Dragon.
Next, I make sure I have the KS wiki pulled up. When I started my first LP, I had to read every line off my Switch then check what I wrote for errors. That was... Well, it actually wasn't as bad as you think, but it was very time-consuming. Fortunately, one of the thread's most standout posters (you are credit to your country, Black Robe) found me a script I could copy lines out of, just editing in any additional lines when I needed them. That was a narrative LP, though, which meant I needed to put in new dialogue or commentary pretty frequently, but that's not an issue here. Some extremely dedicated souls typed out the entire script for Katawa Shoujo on the obligatory fan wiki, conveniently subdivided by scene with ways to locate the right following scene depending on which choices you made. With the right scene selected, I turn off Dragon's microphone so it doesn't get in my way and get started.
And now I'm finished! Playing through an update's worth of content and doing basic formatting takes me maybe 1-2 hours, a little longer if I need to go back and redo something or it contains a bunch of images, music cues, and scene transitions. I took a screenshot of it and marked it up to taste, labeling the first time any important features popped up:
- (pink): "Update:" indicates where the update title goes. Not that it changes, but, you know. On my next round of edits I'll put in the name of whatever the most prominent scene in the update was after the colon.
- (dark orange): "[]wiosna" marks the presence of a music track change. I'll explain why I use the two-bracket thing in a bit.
- (light blue): "() here we go again kids, head back to the big choice" tells me I have to go back and write in some commentary on this section later; since Dragon is currently inactive, I have to manually type these cues, and thus I keep them short and easy to write.
- (yellow): "()" tells me I have to put a screenshot in there. I used to keep track of which screenshot went where by adding a note after each pair of parens (I used [] for commentary and just entered in music links manually), but I abandoned that after discovering a technique I'll show you all in a bit.
- (green): ">Read my book." and its equivalents, as you probably figured out long ago, indicate choices. I took the > demarcations from something I did in an LP that never made it to the archive. I took the => demarcations that indicate whatever choice we took, regrettably, from Homestuck.
- (light blue): Plaintext is just straight narration. You could probably guess this.
- (light orange): "HISAO:" and its ilk, as you also could probably guess, mark lines of dialogue. I'm very careful that every line of dialogue starts with the all-caps name of the speaker, a colon, and a space for reasons I'll go into later.
- (black): 4300 words is a little on the long side for one of my updates but well within acceptable parameters. I usually shoot for 4000 words an update, which works out to about 20 minutes of reading, since IIRC the average English reading speed is about 175 words a minute and I think 20 minutes is about respectable. Well, technically it actually works out to more than 20 minutes of reading, but much of that word count is coding that doesn't show up in the update, so I don't count it.
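The markers in the list above amount to a tiny markup scheme, and it may help to see them treated as one. Below is a minimal Python sketch that sorts draft lines into those categories. It is only an illustration of the convention: the function name classify and the category labels are invented here, and the sample dialogue line in the usage example is made up.

```python
import re

# A dialogue line starts with an all-caps speaker name, a colon, and a space.
SPEAKER = re.compile(r"^[A-Z][A-Z ]*: ")


def classify(line):
    """Label one line of the draft according to the markers described above."""
    if line.startswith("Update:"):
        return "update title"
    if line.startswith("[]"):
        return "music cue"
    if line.startswith("()"):
        # Bare parens hold a spot for a screenshot; parens followed by
        # a note are a reminder to write commentary later.
        return "commentary cue" if line[2:].strip() else "screenshot slot"
    if line.startswith("=>"):
        return "choice taken"
    if line.startswith(">"):
        return "choice offered"
    if SPEAKER.match(line):
        return "dialogue"
    return "narration"


if __name__ == "__main__":
    for sample in ["[]wiosna", "()", ">Read my book.", "HISAO: Hello."]:
        print(sample, "->", classify(sample))
```

Because every marker sits at the very start of a line, each one can later be found and swapped out mechanically without touching the narration around it.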
(Fun fact: the colors I used in that image are (the closest Paint can come to approximating) the Color Universal Design palette, a set of colors specifically designed to be distinguishable for people with just about every version of colorblindness. Feel free to use it in your own work!)
Just about all of this is done manually; while Dragon is technically still active, I've turned off speech recognition and hidden the control bar so it doesn't get in the way of screenshots. It won't come back on anytime soon, because it would just slow down the next phase.
The three steps after setting up the script don't have a fixed order, but I usually... oh fuck, had to stop and go back to fix something here. I have to be careful when saying things that start with a word that can be interpreted as a command; starting a phrase with "cut," for example, might go back and remove a phrase from earlier in the document, or starting a phrase with "read" might read me back what I just wrote or several lines above (Dragon has screenreader functionality on top of everything else). Here, I just said "I usually (pause) go for (pause) putting in screenshots first," and it interpreted "go for" as "go up four." So I wrote the next couple sentences in the middle of the last paragraph before I caught it. Frustrating.
(E: The next part is a bit outdated, but I'm leaving it in for context anyway; I'll cover how it's changed a bit later.)
Anyway, I usually go for putting in screenshots next. Clicking on the screenshots in the directory opens them up in Paint, which I put up with for a while, but, see the button next to the palette and the word "open"? If you select an image, click the arrow, and pick Photos, it opens it up in an app that lets you just navigate between images using the left and right arrows. Press control-S and it'll open up a window that shows you where it's going to be saved with the filename selected and ready to be overwritten; go through the menu on the left to find the folder where you want to save your screenshots, give it whatever name you want, hit enter, and it will take you back to the image, that screenshot now safely in the folder you picked and the process set up to save the next image to the same place. I used to have to make notes in the Word document to help me remember where to put images, but these days I can usually tell where every image has to go, especially since I name and upload screenshots in batches of nine.
Every nine images I save, I head to LPix to upload them. LPix is free for everyone with an SA account, has no image limits, and spits out everything you upload with both a full-size image for reference (SA auto-resizes it if it's too big to fit on a screen) and the BBCode you need to just copy and paste it into the document. Only problem? You can only upload images one at a time. That drags things out. One time I had to do 120 images for a Shield update. That was... An endeavor.
This is what it looks like when I'm done adding images: not a big change, but it will certainly make a difference once we get to the posting stage. Unlike the recording stage, I could use Dragon here, but I don't. Why? Here's what it takes to get a single screenshot from my screenshot folder to the Word document:
- tab over to Photos
- press left to go to the next image
- press control S to go to the save screen
- name the file (the file name starts out selected when I open it, so I don't have to navigate to it)
- press enter to save it
- tab over to LPix
- click choose file
- click the appropriate image
- press enter to select it
- click upload
- click the text I want to copy (img or timg)
- press control A to select it
- press control C to copy the text
- tab over to the Word document
- press down and/or page down until I find the place the image goes
- press backspace twice to delete the parens that mark that image location
- press control V to paste the text
If I were to use Dragon, every one of those steps would have to be voiced separately, processed over a second or two, and redone if something goes wrong. And while saving and uploading images in batches drastically cuts the overall number of steps down, my need to double-check what I saved and where it's going (and fix something when I inevitably get things mixed up) pushes the command count back up. While there are ways to automate this by setting up macros in Dragon, that would involve programming, which I generally regard with superstitious disdain. It's just easier to do it this way, comfort and efficiency be damned.
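To put a rough number on that, here is a back-of-the-envelope Python sketch. It takes the 17 steps listed above and the "second or two" of processing per command at face value, and it deliberately ignores both the savings from batching and the cost of redoing misheard commands, so treat the result as a ballpark and not a measurement. The function name is made up for the example.

```python
STEPS_PER_IMAGE = 17  # the steps in the list above


def voiced_overhead_minutes(images, seconds_per_command):
    """Minutes spent just waiting for Dragon to process each spoken command."""
    return images * STEPS_PER_IMAGE * seconds_per_command / 60


if __name__ == "__main__":
    # The 120-image Shield update, at one and two seconds per command.
    print(voiced_overhead_minutes(120, 1), voiced_overhead_minutes(120, 2))
```

Under those assumptions, the 120-image update would cost somewhere between 34 and 68 minutes of pure processing time before a single misrecognized command is counted.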
Anyway, next I handle music links and portraits by using programming... wait, no, no, don't go away, I promise this isn't as bad as you're thinking. Let me walk you through what I did. It requires zero programming knowledge, and as long as you don't let your eyes glaze over in fear you can do it too.
First off, go to Word and right-click anywhere that isn't a button on that banner of options that includes the name of the font you're using and such. Click "Customize the Ribbon."
You see that box that says "Developer"? Make sure it's checked and head back to the main screen.
In the banner you clicked earlier, there should be a line of options on the top that say "Home," "Insert," etc. One of them should now read "Developer" (I circled it in orange). Click that. You see that "Record Macro" button, the one in green? Click that. Enter an indicative name (without spaces) where it tells you to name the macro, a description of what it does in the description section (in this case, it adds in portraits and music links), and click okay. Then take your hands off the keyboard; this next part is by far the most delicate.
Press control H; that will open the find and replace function. Enter text in both spaces; it doesn't actually matter what you put in there as long as it's different, so just put a couple letters in there. Click "Replace All," click okay, enter a few different letters in each space, click "Replace All," click okay, and finally click close. Then go to the upper left and click on the red square that says "Stop Recording." And you're done with this step! Now click the button labeled "Visual Basic" a couple buttons to the left and try not to be overwhelmed.
There's a bunch of options on this page and it might look different from what I have, but you can ignore all of that. Just look at the text you have there. You should only have one block of text, and it should look something like this:
code:
Sub lpExampleFindReplace()
'
' lpExampleFindReplace Macro
'
'
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "dksfj;aldjsg"
        .Replacement.Text = "asdfadsfas"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    With Selection.Find
        .Text = "pospjpjpjppjp"
        .Replacement.Text = "opkpojpojpjpojpo"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
code:
    With Selection.Find
        .Text = "dksfj;aldjsg"
        .Replacement.Text = "asdfadsfas"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
code:
    ' Swap the shorthand music cue for the finished BBCode link
    With Selection.Find
        .Text = "[]afternoon"
        .Replacement.Text = "[url=https://www.youtube.com/watch?v=dQw4w9WgXcQ]Katawa Shoujo OST - Afternoon[/url]"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
Now let's look at how you do portraits:
code:
    ' vbCr is a paragraph break; two in a row mean a blank line before the speaker's name
    With Selection.Find
        .Text = vbCr & vbCr & "MISHA:"
        .Replacement.Text = vbCr & vbCr & "[img]https://lpix.org/4021107/misha head.png[/img]" & vbCr & "MISHA:"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
code:
Sub lpExampleFindReplace()
'
' lpExampleFindReplace Macro
'
'
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    ' Music link: swap the shorthand cue for the finished BBCode link
    With Selection.Find
        .Text = "[]afternoon"
        .Replacement.Text = "[url=https://www.youtube.com/watch?v=dQw4w9WgXcQ]Katawa Shoujo OST - Afternoon[/url]"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    ' Portrait: put the image on its own line above the speaker's name
    With Selection.Find
        .Text = vbCr & vbCr & "MISHA:"
        .Replacement.Text = vbCr & vbCr & "[img]https://lpix.org/4021107/misha head.png[/img]" & vbCr & "MISHA:"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
quote:
[]afternoon
MISHA: Wahaha~! Fooled you~!
I could run it through what we just put together
quote:
Katawa Shoujo OST - Afternoon
MISHA: Wahaha~! Fooled you~!
And get this. Go ahead and click the link; it's usually worth verifying your links go to the right place. Plus, due to how I formatted it, you can run what we put together multiple times in the same document without worrying about anything getting overwritten, which makes it easier to add or move around portions. Adding in more links and portraits is easy: click Visual Basic (to the left of Record Macro) to go to what you put together just now, copy a music link or portrait subsection that looks like what I showed you above, paste it between the end of the last subsection and End Sub (make sure you keep everything indented in the same way!), and switch out the words and links as appropriate.
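But there's one last step to go before we're done. Go all the way up to the part at the top of the text where it says Sub (whatever you named it). Type Public in front of it. That way, every document on your computer can use what you just put together; otherwise you'd have to copy and paste the whole damn thing every time you open a new document, which is not ideal. (Word treats a Sub as public by default, and a recorded macro is shared across documents when it's stored in "All Documents (Normal.dotm)", the default choice in the Record Macro window, so typing Public is a harmless safeguard and not the thing doing the work.) And then you're done!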
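The claim that the macro can be run over the same document again and again is worth checking, and it can be checked outside Word. The Python sketch below mimics the two find-and-replace rules with plain string replacement and confirms that a second pass changes nothing. It is a stand-in and not the macro itself: "\r" plays the role of vbCr, the names RULES and run_macro are invented for the example, and unlike the macro (which has MatchCase set to False) Python's str.replace is case-sensitive.

```python
PARA = "\r"  # stand-in for Word's paragraph break (vbCr in the macro)

# (find, replace) pairs copied from the two subsections of the macro
RULES = [
    (
        "[]afternoon",
        "[url=https://www.youtube.com/watch?v=dQw4w9WgXcQ]"
        "Katawa Shoujo OST - Afternoon[/url]",
    ),
    (
        PARA + PARA + "MISHA:",
        PARA + PARA + "[img]https://lpix.org/4021107/misha head.png[/img]"
        + PARA + "MISHA:",
    ),
]


def run_macro(text):
    """Apply every rule in order, like one run of the recorded macro."""
    for find, replacement in RULES:
        text = text.replace(find, replacement)
    return text


if __name__ == "__main__":
    draft = "[]afternoon" + PARA + PARA + "MISHA: Wahaha~! Fooled you~!"
    once = run_macro(draft)
    # Running it a second time finds nothing left to replace.
    print(run_macro(once) == once)
```

The second pass is a no-op because neither search string survives the first: the cue "[]afternoon" is gone entirely, and after the portrait is inserted "MISHA:" is preceded by only one paragraph break, not two.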
E: With some advice from the thread, I was able to update my image uploading process for efficiency.
Explopyro posted:
I'll do my best, for better or worse. Here's the way I've been doing it with ShareX (https://getsharex.com/).
To set up ShareX to upload things to LPix (you only need to do this once):
1. Go to this page and copy all of the text in the white box to your clipboard.
2. Open ShareX.
3. Go to Destinations -> Image Uploader and pick Custom Image Uploader.
4. Go to Destinations -> Custom Uploader Settings.
5. Click Import -> From Clipboard. (On the left panel, partway down.) This should add lpix as a custom uploader.
6. On the right side, you should see the username, password, gallery fields. Fill these in with your credentials appropriately (for your LPix account, not your forums account). I usually make a new gallery for each update (and change the setting here before I upload), but organise things how you like; if you're just using the default gallery, leave Default here.
7. Close the window. You should now be ready to upload images.
To upload images:
Upload -> Upload File (to upload a single image)
Upload -> Upload Folder (batch upload an entire directory at once). It will ask you to confirm ("do you want to upload 100 images?") before starting.
Once the images are uploaded, they'll show up in the main window, you can right-click them there and pick "copy URL" to get the link to the image. Although what I usually do is just open the LPix gallery in my browser and copy the URLs from there, I name them sequentially so I can just go down the list and paste them into the appropriate location in my post.
I hope that broke it down enough to be helpful, maybe?
The program is more powerful and does other things than this, but this is all I need from it, so I haven't really experimented with other features. When my average update is 150-200 images, it isn't really practical to do them individually.
If you're only doing 15ish images per post and your current methodology is working for you, you might not want to go through the hassle, but IMO it's worth it at any point beyond single-digit numbers of images.
I learned about ShareX long before I started this LP, but while I've tried to use it with past LPs, a mixture of unclear instructions, difficulty using Dragon and/or my poor, abused tendons to experiment until I got it working, and inertia since my previous method worked kept me away. But now it works! Except when I upload images like this I get links that don't have BBCode, which would mean I'd have to go through and add it to every image manually if I didn't have access to this:
code:
    ' Open an [img] tag in front of every bare LPix link that follows a blank line
    With Selection.Find
        .Text = vbCr & vbCr & "https://lpix.org"
        .Replacement.Text = vbCr & vbCr & "[img]https://lpix.org"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    ' Close the tag after any .png that is followed by a blank line
    With Selection.Find
        .Text = ".png" & vbCr & vbCr
        .Replacement.Text = ".png[/img]" & vbCr & vbCr
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
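For comparison, the same tag-wrapping can be sketched in a few lines of Python with a regular expression. This is a different technique from the Word macro above, shown only to make the logic visible: it assumes the draft has been saved as plain text with ordinary newlines, that each bare link sits alone on its own line, and that every screenshot is a .png. The name wrap_images and the sample link are made up.

```python
import re

# A bare LPix link that occupies an entire line and ends in .png
BARE_LPIX = re.compile(r"^(https://lpix\.org/\S+\.png)$", flags=re.MULTILINE)


def wrap_images(text):
    """Wrap every bare LPix .png link in [img]...[/img] BBCode tags."""
    return BARE_LPIX.sub(r"[img]\1[/img]", text)


if __name__ == "__main__":
    draft = "Narration.\n\nhttps://lpix.org/4021107/shot01.png\n\nHISAO: Hi."
    print(wrap_images(draft))
```

As with the macro, running it twice is safe: a link that has already been wrapped starts with [img] and no longer matches the pattern.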
Uh, okay, that was a detour, but notice how I didn't mention dictation software again. I could have used Dragon to write all of this code; in fact, coding with speech-to-text software is a whole thing that people give talks on. Unfortunately, getting that functionality working requires at least some programming experience just to get it off the ground, and what I just showed you is the extent of my programming knowledge. In order to use that programming functionality, I need to understand programming, and in order to understand programming, I have to go out and learn how to do it using interfaces that AREN'T speech-to-text friendly. I just do not have the time, energy, and physical condition to do that.
Okay, anyway, here's what it looks like once we've run everything through. Now all we have to do is add the commentary, and finally, FINALLY it's time for Dragon to shine. See, here's the thing about how I write: like many people, I do it best in motion. For years, I would stand up, pace for a few seconds, and sit back down to type a couple sentences, rinse and repeat for an hour maybe. But with dictation? No sitdown phase, no time lapse between me thinking and entering those thoughts, no physical exhaustion caused by standing up and sitting down. Between the time it takes for the program to process my speech and my need to periodically go back and correct mistakes, I can only enter maybe 150 words a minute, half as much as I can type. But I can also go much, much longer. Before Dragon, I remember 600 words a day was about my limit; any more would start pushing my body and my brain. Last winter I pushed out 1500 words a day, every day, for four months: by the end there was enough original text in my Shield LP to fill a full novel. I used to think it was impossible for me to write long-form anything. Now I know that's not the case, and I've written more in the last year and a half than in the rest of my life combined. 10 years ago, the few hundred words it takes to write the commentary for this update would have been about as much as I could handle over the course of a day without pushing myself. Now it's something I can do almost without thinking.
So, what can we put together out of all of this? I think we have a couple takeaways. First, as we've talked about before, the world tends to ignore disability issues, and resources for them, when they exist, are often expensive, limited, and hard to use. Between the program and various pieces of equipment, I've easily blown 250 bucks just getting the software to the point where I can use it comfortably, and that's not the kind of money many of us can spare. Even with it fully functional, there are certain things it just can't handle (which closes off big chunks of the Internet from me), and of the stuff I can use, lots of it is awkward, time-consuming, or requires technical proficiency I may not be equipped to learn. I mean, there are ways I could handle just about everything I mentioned with the right plug-ins and equipment, but that would require finding the right stuff, spending money on it, and sinking enough time into it to get everything working. When faced with that wall of inconvenience, it's easier to ignore the software sometimes, even if it sucks and defeats the point.
And on the other hand, it's absolutely vital. Without dictation software, I could not have written this disability corner. Or any other disability corners, or this LP, or anything at all longer than a couple paragraphs. The dictation software constantly gets in my way, trips over itself, and fails to perform vital functions, but the difference its presence makes in my life is almost immeasurable. If you needed an example of what assistive technology can do for a person, look at the trail of LPs I've left behind me over the last year and remember it made them possible.
As a postscript: please feel free to use everything I've laid out here in your own efforts; it's not like there isn't more room for screenshot LPs out there. If you have any problems using it, get in contact with me and I'll set aside time to help walk you through it (or tell you to go to the tech support fort if I can't handle it).